Robust continuous speech recognition using parallel model combination
Abstract
This paper addresses the problem of automatic speech recognition in the presence of interfering noise. It focuses on the Parallel Model Combination (PMC) scheme, which has been shown to be a powerful technique for achieving noise robustness. Most experiments reported on PMC to date have used small-vocabulary systems of 10 to 50 words. Experiments on the Resource Management (RM) database, a 1000-word continuous speech recognition task, reveal compensation requirements not highlighted by the smaller-vocabulary tasks. In particular, it is necessary to compensate the dynamic parameters as well as the static parameters to achieve good recognition performance. The database used for these experiments was the RM speaker-independent task with either Lynx Helicopter noise or Operation Room noise from the NOISEX-92 database added. The experiments reported here used the HTK RM recogniser developed at CUED, modified to include PMC-based compensation for the static, delta and delta-delta parameters. After training on clean speech data, the performance of the recogniser was found to be severely degraded when noise was added to the speech signal at between 10 dB and 18 dB. However, using PMC, the performance was restored to a level comparable with that obtained when training directly in the noise-corrupted environment.
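As a rough illustration of what compensating the static parameters can involve, the sketch below applies the log-add approximation often associated with PMC to a pair of mean vectors. It is not the paper's implementation. It assumes a square orthonormal DCT so the cepstral-to-log-spectral mapping is exactly invertible, it handles means only (no variances, no delta or delta-delta parameters), and the names `dct_matrix`, `log_add_combine` and `gain` are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n)]
        )
    return rows


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def log_add_combine(speech_cep, noise_cep, gain=1.0):
    """Combine clean-speech and noise cepstral means (log-add approximation).

    Assumption: both vectors have the same length and were produced with the
    same square orthonormal DCT, so the inverse mapping is the transpose.
    """
    n = len(speech_cep)
    c = dct_matrix(n)
    c_inv = transpose(c)
    # Map cepstral means to the log-spectral domain.
    speech_log = matvec(c_inv, speech_cep)
    noise_log = matvec(c_inv, noise_cep)
    # Speech and noise are treated as additive in the linear spectral domain.
    combined_log = [
        math.log(gain * math.exp(s) + math.exp(nz))
        for s, nz in zip(speech_log, noise_log)
    ]
    # Map the corrupted-speech means back to the cepstral domain.
    return matvec(c, combined_log)
```

When the noise means are far below the speech means, the combined vector stays essentially equal to the clean-speech vector; when the two are equal in every channel, each log-spectral mean rises by log 2.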
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Residual noise compensation for robust speech recognition in nonstationary noise
We present a model-based noise compensation algorithm for robust speech recognition in nonstationary noisy environments. The effect of noise is split into a stationary part, compensated by parallel model combination, and a time varying residual. The evolution of residual noise parameters is represented by a set of state space models. The state space models are updated by Kalman prediction and t...
Speech recognition in noisy environments using first-order vector Taylor series
In this paper, we generalize relations between clean and noisy speech signal using vector Taylor series (VTS) expansion for noise-robust speech recognition. We use it for both the noisy data compensation and hidden Markov model (HMM) parameter adaptation, and apply it for the cepstral domain directly, while Moreno used it to estimate the log-spectral parameters. Also, we develop a detailed ...
Journal:
- IEEE Trans. Speech and Audio Processing
Volume 4, Issue
Pages -
Publication date: 1996